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Abstract 

This  paper  examines  the  characteristics  of  three  types  of  random  error  measures  in  the 
presence  of  negative  power  law  (neg-p)  noise:  (a)  the  observable  residual  error  after  removing 
an  estimate  of  an  information  containing  causal  function  from  data ,  (b)  the  jitter ,  the  residual 
error  with  additional  highpass  (HP)  filtering ,  and  (c)  Mth -order  difference  (A)  variances ,  such 
as  the  Allan  variance  (lst-order  A-variance  of  the  fractional  frequency  error  y(t))  and  the 
Hadamard-Picinbono  variance  (2nd -order  A-variance  of  y(t)).  Measures  (b)  and  (c)  are  used  to 
mitigate  perceived  divergence  problems  in  the  mean  square  (MS)  of  Measure  (a)  due  to  the 
presence  of  neg-p  noise .  This  paper  proves  that  this  perception  is  wrong;  it  shows  that  the  MS  of 
Measure  (a)  converges  in  the  presence  of  neg-p  noise  by  demonstrating  that  extracting  a 
statistically  optimal  estimate  of  the  causal  behavior  from  data  HP  filters  the  noise  in  the 
measure.  It  is  further  shown  that  the  order  of  this  noise  HP  filtering  increases  with  the 
complexity  of  the  model  function  used  to  estimate  the  causal  behavior  in  the  data.  Thus ,  if  one  is 
free  to  choose  the  complexity  of  the  model  function ,  the  MS  observable  residual  error  is 
guaranteed  to  converge  for  any  negative  power  in  the  noise  PSD.  Because  of  this ,  it  is  shown 
that  the  jitter  can  be  defined  simply  as  the  observable  residual  error  without  additional  HP 
filtering ,  making  the  jitter  and  residual  error  the  same  error  measure.  This  paper  finally  shows 
that  an  Mth-order  A-variance  is  also  a  measure  of  the  MS  of  the  observable  residual  error  for 
any  number  of  data  samples  when  the  model  function  is  an  (M-l)th -order  polynomial.  This 
completes  the  equivalence ,  showing  that  Measures  (a),  (b),  and  (c)  all  measure  the  same  kind  of 
error  when  the  model  function  for  the  causal  behavior  is  a  polynomial.  The  consequences  of 
this  equivalence  are  then  explored.  Among  these  is  a  physical  explanation  for  the  fact  that  the 
Allan  variance  is  sensitive  to  frequency  drift ,  while  the  Hadamard-Picinbono  variance  is  not. 


INTRODUCTION 

Different  approaches  for  specifying  random  time,  phase,  and  frequency  (TPF)  error  are  used  across  the 
electrical  engineering  (EE)  community.  As  shown  in  Figure  1,  these  approaches  are  as  follows. 
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Figure  1.  Residual  error  (left),  jitter  and  wander  (middle),  and  difference  variances  (right). 
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Observable  Residual  Error 


We  will  define  ,  the  mean  square  (MS)  observable  residual  error,  as  the  MS  of  the  difference 

between  a  set  of  N  data  samples  v(tn)  (over  an  observation  time  T)  and  vw  M(tn,A)  an  estimate  of  a  true 
causal  function  vc(t)  imbedded  in  the  data  [1].  The  parameter  M  in  g^_j(N,M)  relates  to  the  M  in 
vwM(tn,A)  and  will  be  explained  later.  a^_j(N,M)  is  an  important  random  error  statistic  in  system 

specification,  because  it  relates  directly  to  primary  performance  measures  in  many  systems.  These 
measures  include  the  signal-to-noise  ratio  (SNR),  the  bit  or  symbol  error  rate  (BER),  the  noise  power  ratio 
(NPR),  the  effective  number  of  bits  (ENOB),  and  the  multiplicative  noise  ratio  (MNR)  (signal  processing 
noise)  [2-8].  In  many  treatments,  the  terms  standard,  sample  [9],  and  MS  (RMS  for  deviate)  error  are 
used  when  a^_j(N,M)  is  intended,  but  we  will  use  the  neutral  term  residual  error,  because  it  does  not 

have  the  alternate  definitions  or  connotations  associated  with  these  other  terms.  Note  that  we  are  explicit 
in  stating  that  it  is  vw  M(tn,A) ,  the  estimate  of  the  causal  function,  that  is  removed  from  the  data,  not 

vc(t),  the  true  causal  function.  The  distinction  between  the  estimated  and  true  causal  functions  is  often 
glossed  over  in  treatments  of  residual  error,  but  this  distinction  will  be  important  later  in  our  discussion. 
Thus,  only  the  observable  residual  error,  based  on  causal  behavior  estimated  from  the  data  set  itself,  is 
directly  measurable  or  observable  from  a  set  of  data,  even  if  the  desired  error  measure  for  system 
specification  is  the  true  residual  error  [1]. 

vwM(tn,A) ,  the  model  function  used  to  estimate  vc(t),  will  be  considered  a  function  of  M  parameters 
represented  by  the  column  vector  A  =  (a0,a1  ,...aM_1)'  ('  is  the  matrix  transpose)  as  well  as  the  observation 
or  sample  time  tn.  Thus,  the  M  in  vw  M(t,A)  is  the  source  of  the  M  in  o^_j(N,M)  .  In  this  paper,  we  will 
adjust  A  in  vwM(tn,A)  to  obtain  a  statistically  optimal  estimate  of  vc(t)  by  utilizing  a  least-squares  fit 
(LSQF)  over  the  N  samples  of  data  v(tn)  [1],  where  we  will  assume  the  v(tn)  are  evenly  spaced  over  the 
observation  interval  T.  For  Gaussian  random  (though  divergent  neg-p)  noise,  such  an  LSQF  is  equivalent 
to  other  maximum  likelihood  methods  [1,10],  especially  if  we  allow  the  LSQF  to  be  weighted  [1]. 
Estimating  the  M  parameters  in  A  or  the  function  vw  M(t,  A)  from  the  data  is  equivalent  to  the  extraction 

of  causal  information  from  the  data,  hence  the  title  of  the  paper.  This  is  true  whether  the  information  is  a 
desired  product  of  the  system  or  is  just  another  error  parameter  that  impacts  system  performance,  such  as 
the  frequency  aging  of  an  oscillator. 

Jitter  and  Wander 

Jitter  and  wander  [4,11,12]  have  been  introduced  to  deal  with  perceived  divergence  problems  in  the 
residual  error  associated  with  the  presence  of  negative  power  law  (neg-p)  noise  [13].  Time  jitter  and 
wander  are  currently  defined  as  brick-wall  highpass  (HP)  and  lowpass  (LP)  filtered  variations  in  the  time 
error  x(t)  or  the  time  interval  error  TIE  with  a  crossover  frequency  fc  excluding  causal  frequency  offsets 
and  drifts  (and  implicitly  causal  time  offsets)  [4,11,12].  Thus,  jitter  and  wander  are  effectively  HP  and 
LP  filtered  x(t)  residual  errors  after  removal  of  the  2nd-order  causal  behavior  in  x(t).  The  brick-wall  HP 
filtering  in  the  jitter  ensures  convergence  for  neg-p  noise  and  dumps  the  convergence  problem  into  the 
wander,  which  is  usually  ignored  in  discussions  of  jitter. 

For  the  purposes  of  the  discussion  that  follows,  we  note  that  power  law  (neg-p)  noise  for  a  general 
variable  v(t)  is  wide-sense  stationary  noise  with  a  single  sideband  power  spectral  density  (PSD) 
Lv(t)  oc  fp  [13].  Here,  we  are  departing  from  [13]  in  utilizing  the  SSB  PSD  Lv(t)  rather  than  the  double 
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sideband  or  single-sided  PSD  Sv(t) ,  hence  the  use  of  Lv(t)  to  avoid  confusion  with  the  DSB  form  Sv(t) . 
Neg-p  noise  will  designate  such  power  law  noise  when  p  <  0  (usually  -1,-2, -3, -4  [13]),  and  the  p  =  0 
case  is  generally  called  white  v-noise.  We  also  note  that  v(t)  will  be  generally  considered  a  reference  to 
TPF  variables,  such  as  x(t)  the  time  error,  4>(t)  the  phase  error,  or  y(t)  the  fractional  frequency  error  [4,13], 
but  results  in  this  paper  in  terms  of  v(t)  will  apply  any  variable  without  limitation. 


The  problem  associated  with  the  use  of  jitter  and  wander  is  that  the  relationship  of  fc  to  natural  filtering 
parameters  in  the  system  under  consideration  is  often  unclear.  The  ITU  arbitrarily  defines  fc  as  10  Hz  [4]. 
This  is  helpful  in  standardizing  producers  of  TPF  equipment,  but  bears  only  an  accidental  relationship  to 
parameters  in  general  user  systems.  The  IEEE  Broadcast  Technology  Society  (BTS)  [11]  and  the  Society 
of  Motion  Picture  and  Television  Engineers  (SMPTE)  [12]  relate  fc  to  the  loop  bandwidth  of  a  phase 
locked  loop  (PLL)  in  a  user  system.  This  is  helpful  for  users  with  PLLs  in  their  systems,  but  leave  users 
without  appropriate  PLLs  in  their  systems  in  doubt.  We  will  address  this  fc  relationship  problem  later  in 
this  paper. 

Mth-Order  Difference  (A)  Variances 


av,M(x)  dhe  Mth-order  difference  (A)  variances  [14],  is  a  generalization  of  the  Allan  or  two-sample 

variance  [13],  which  is  the  lst-order  A- variance  of  y(t)  and  the  zero-dead-time  Hadamard  variance  [15-18] 
or  Picinbono  variance  [19],  which  is  the  2nd-order  A- variance  of  y(t).  These  variances  are  considered 
stability  measures  and  are  used  because  of  their  excellent  convergence  properties  in  the  presence  of  neg-p 
noise  [13,14,16-19].  In  fact,  one  can  show,  for  any  M,  that  the  Mth-order  A- variance  of  v(t)  will  HP  filter 

Lv(f)  with  a  2Mth  zero  at  f=0  [14].  As  indicated  in  Figure  2,  cr^M(x)  is  given  by  [14] 

aJ)M(x)  =  ^MS{A(i)Mv(tn)}  (1) 


where  A(x)  is  the  forward  difference  operator  over  the  separation  interval  x  defined  by  [14] 

A(x)v(t)  =  v(t  +  x)  -  v(t) 

and  the  normalization  constant  is  defined  as  [14] 

M  f  A/rl 

- 


(2) 

(3) 


j-  A(x)2v(U 

■  A(x)v(tn)  -^-A(x)v(tn+x) 


t 


<-MS  of  A(x)Mv(tn)  over  T=  N-x0^ 


Figure  2.  Mth-order  difference  variances. 


We  note  that  this  definition  of  XM  makes  all  M  orders  of  o^M(x)  equal  for  uncorrelated  white  noise  [14]. 
The  MS  operation  in  (1)  is  over  N  samples  at  tn  =  t0  +  nx0  (n  =  0  to  N - 1 ),  so  that  the  samples  again  fall 
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within  a  total  observation  interval  T  =  Nt0  ,  where  t0  is  the  sampling  interval.  There  is  more  than  one 
way  to  take  the  MS  over  the  data,  and  we  will  discuss  this  later  in  the  Mathematical  Niceties  Section. 

The  Highpass  Filtering  of  Noise  Due  to  Information  Extraction 


The  common  wisdom  is  that  the  rigorous  form  of  the  MS  residual  is  divergent  in  the  presence  of  negative 
power  law  (neg-p)  noise.  This  paper  will  show  that  this  common  wisdom  is  wrong,  because  the  process 
of  optimally  estimating  the  causal  function  HP  filters  the  noise  in  the  residual  error.  We  will  also  show 
that  the  order  of  this  HP  filtering  increases  with  the  complexity  of  the  information  extracted  from  the  data, 
as  expressed  by  the  number  of  parameters  M  in  vw  M(t,A) .  Thus,  if  one  is  free  to  choose  the  complexity 

of  vw  M(t,  A) ,  the  residual  error  is  guaranteed  to  converge  for  any  negative  power  in  Lv(f) . 

This  HP  filtering  of  the  noise  in  the  observable  residual  error  allows  one  to  eliminate  additional  filtering 
in  the  definitions  of  jitter  and  wander.  Thus,  we  will  redefine  the  jitter  simply  as  the  observable  residual 
error  without  such  additional  filtering.  Later  in  the  paper,  we  will  show  that  the  needed  HP  filtering  is 
supplied  by  a  combination  of  the  LSQF  process  and  other  system  related  filtering.  The  wander  will  then 
be  redefined  as  the  difference  between  the  estimated  and  true  causal  functions,  which  we  will  show  is  LP 
filtered  by  the  same  system  related  filtering  processes  as  the  jitter.  These  new  definitions  solve  the 
relationship  problem  between  fc  and  system  parameters  for  general  user  systems.  They  also  allow  one  to 
generalize  the  concepts  of  jitter  and  wander  to  include  variations  in  any  variable  and  the  removal  of  any 
type  of  causal  behavior. 

Mth-Order  Difference  Variances  as  Measures  of  Residual  Error 

The  paper  will  also  show  that  g^m(T/M)  is  a  measure  of  a^_j(N,M)  for  any  N  when  va  M(t,A)  is  an 
(M  -  l)th  polynomial  and  the  MS  operation  for  g^_j(N,M)  uses  the  “unbiased”  estimator;  that  is,  it 
divides  the  sum  in  the  MS  operation  by  N  -  M  .  Unbiased  is  in  quotes  because  it  is  the  true  unbiased 
estimator  only  for  uncorrelated  white  noise  [20,21].  This  relationship  between  a^M(T/M)and 

ciy_j(N,M)  is  well-  known  for  the  Allan  or  two-sample  variance  Gy(x)  [13][23]  or  Gy_j(2,l),  which  has 

been  related  to  the  N-sample  variance  cjy(N,x,x)  [23,24]  or  cjy_j(N,l)  using  Allan-Bames  “bias”  functions 

[23,24].  We  will  generalize  this  argument  to  relate  g^m(T/M)  to  a^_j(N,M)  using  a  similar  bias 

function  concept,  except  that,  in  our  case,  we  will  hold  the  total  observation  time  T  constant  as  N  varies. 
Because  of  this  difference,  the  bias  function  relating  a^M(T/M)  to  a^_j(N,M)  is  much  less  dependent  on 

p  than  in  the  Allan-Barnes  case,  and  we  will  show  that  there  is  a  simple  approximate  form  for  our  bias 
functions  independent  of  p. 

This  brings  us  full  circle,  showing  that  residual  variances,  jitter,  and  A-variances  measure  the  same  type 
of  random  error  when  a  polynomial  is  used  to  model  the  causal  behavior  of  the  data.  This  relationship 
between  a^M(T/M)  and  a^_j(N,M)  also  yields  a  physical  interpretation  for  the  well-known 

insensitivities  of  g^m(x)  to  (M  -  l)th  -order  or  lower  polynomial  causal  aging  behavior  [25,26]. 
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MATHEMATICAL  NICETIES 

Data,  Noise,  and  Residual  Error  Model 

Figure  3  shows  the  model  we  will  use  for  the  N  data  samples  v(tn)  and  the  various  types  of  residual 
errors  we  will  be  defining.  Let  us  represent  the  total  (continuous)  data  variable  v(t)  as 

v(t)  =  vc(t)  +  vp(t)  (4) 

where  vc(t)  is  the  true  causal  function  imbedded  in  the  data  and  vp(t)  is  the  true  random  noise.  We  will 
assume  that  v(t)  and,  thus,  vc(t)  and  vp(t) ,  have  been  pre-filtered  by  a  system  response  function  written 
as  hs(t)  in  the  time  domain  and  Hs(f)  in  the  frequency  domain  such  that 

/•+00 

v(t)=[dths(t-t')vin(t')  (5) 

J—  00 

where  vin(t)  is  the  variable  prior  to  system  filtering  [5,6,27].  Hs(f)  describes  the  filtering  action  of  the 
system  on  the  variable  over  and  above  any  filtering  introduced  by  the  LSQF.  It  is  well-known  that  such  a 
filter  acting  on  the  pre-filtered  noise  variable  corresponding  to  vp(t)  will  produce  an  output  PSD  of 

|  Hs(f)  |2  Lv(t)  when  the  pre-filtered  PSD  of  the  noise  variable  is  Lv(t)  [10].  Thus,  the  PSD  of  the  post- 
filtered  vp(t)  will  be  written  as  |Hs(f)|2  Lv(t)  in  this  paper,  so|Hs(f)|2  explicitly  appears  in  spectral 
formulas. 


True  Noise 
vp(t) 


*w,M 


(t,A) 


Observable  Res  Error  Vj(tn)  y 

- - r— 

- rL  V-V'"" —  True 

X  T  T  Causal  Fn 

Data  v(tn)  True  Fn  Error  vw(tn)  vc(t) 


N  Samples  over  T  =  N  x0 


Figure  3.  Data  model  and  residual  errors. 


We  will  further  assume  the  model  function  vw  M(t,  A)  is  linear  in  A,  so  it  can  be  represented  by 

M-l 

Vw,M(tn>A)  =  X  amUm(tn)  (6) 

m=0 

where  the  um(t)  are  a  set  of  (not  necessarily  orthogonal)  basis  functions.  An  important  class  of 
vw  M(t,  A)  that  we  will  discuss  consist  of  polynomials,  for  which  um(tn)  =  tnm . 

The  observable  residual  error  Vj(tn)  is  thus 

vj(tn)  =  v(tn)-vw,M(tn»A)  .(7) 

and  its  MS  or  variance  is  given  by 

atj=MS{Vj(tn)}.  (8) 
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In  this  paper,  we  will  define  Vj(tn)  above  as  the  jitter.  The  true  function  error  is  given  by 


vw(tn)  =  vw>M(t„,A)-vc(tn)  (9) 

and  its  MS  or  variance  is  given  by 

c>2_w  =MS{vw(tn)} .  .(IQ) 

For  this  paper,  we  will  define  vw(tn)  above  as  the  wander,  which  one  can  see  is  not  directly  observable 
variable,  since  one  must  know  either  vc(t)  or  vp(t)  to  generate  it  from  the  data.  More  will  be  said  about 
this  later. 

We  note,  from  (4)  and  (7)  and  vw(tn) ,  that  we  can  write 

Vp(t)  =  Vj(t)  +  Vw(t).  (11) 

Thus,  vj(tn)  and  vw(tn)  sum  together  to  form  the  total  noise  just  as  conventional  jitter  and  wander  do. 
Later  in  the  paper,  we  will  show  that  vj(tn)  and  vw(tn)  have  HP  and  LP  properties  similar  to  those  in  the 
old  definitions  [44U2],  but  with  their  HP  and  LP  properties  completely  determined  by  Hs(f)  and  the 
LSQF  estimation  process. 

Mth-Order  Differences 


We  note  that  A(t)m  v(tn)  can  be  written  in  expanded  form  as  [27] 

M 

A(x)Mv(tn)  =  ^c(M,m)v(tn  +mx) 

m=0 


where 


c(M,m) 


(~l)M~mM! 
m!(M-m)! ' 


(12) 


(13) 


Thus,  X  M  in  (3)  can  be  written  as 

M 

^M=£c(M,m)2.  (14) 

m=0 

As  an  additional  note,  (3)  and  (14)  correct  a  typographical  error  in  [27]  in  which  the  upper  limit  of  the 
sum  was  mistakenly  written  as  M  - 1 . 

Definition  of  Mean  Square  Operation 

The  MS  operation  for  statistics  of  variances  will  be  defined  as 

N-l 

MS{z(tn)}  =  jyn|z(tn)|2.  (15) 

n=0 

This  can  represent  various  types  of  mean  square  operations  depending  on  the  values  of  n  and  the  form 
of  hs(t) .  The  unweighted  biased  MS  is,  thus,  given  by  =  N_1,  and  the  “unbiased”  MS  is  given  by 
^n=(N-M)_1.  One  can  also  use  various  combinations  of  and  hs(t)  to  represent  overlapping, 
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modified,  or  total  averaging  MS  operations  [11,14,26].  In  theoretical  variance  representations  of  the  MS 
operation  (see  Appendix  A),  we  will  assume  the  ensemble  average  or  expectation  operator  E{...}  has 
been  applied  in  addition  to  the  MS  operation  defined  in  (15). 

THE  HIGHPASS  FILTERING  OF  NOISE  DUE  TO  INFORMATION 
EXTRACTION 


In  this  section,  we  will  first  explain  intuitively  and  graphically  how  and  why  the  LSQF  estimation  of 
vc(t)  using  vw  M(t,  A)  HP  filters  the  noise  in  the  residual  error;  then  we  will  formally  prove  this  assertion. 

For  the  intuitive  explanation,  consider  Figure  4,  where  we  show  FQSF  solutions  for  various  power-law 
noise  indices  p  .  For  p  =  0  (white  noise),  one  can  see  that  the  solution  behaves  in  the  classical  manner 

[1],  with  vwM(t,A)  closely  tracking  vc(t).  Using  classical  FSQF  theory  [1],  one  can  show  that 
vwM(t,A)  ^  vc(t)  with  av_w  -^0  as  N^co  as  long  as  the  bandwidth  of  the  data  is  such  that  the  data 
samples  remain  uncorrelated  for  any  value  of  N.  For  p  =  -2  and  -  4  (neg-p  noise),  however,  one  can  see 
there  are  significant  systematic  long-term  deviations  in  vw  M(t,  A)  from  vc(t)  for  the  large  N  case  shown. 

This  is  due  to  the  highly  correlated  nature  of  neg-p  noise.  In  fact,  using  FSQF  theory  for  correlated 
noise,  one  can  show  that  these  deviations  will  remain  non-zero  as  N  ^  oo  ,  because  vw  M(t,A)  will  track 

components  in  the  noise  with  Fourier  frequency  f  approximately  equal  or  less  than  fT  =1/T  (for  an 
unweighted  FSQF),  regardless  of  the  value  of  N.  (It  is  noted  that  T  is  fixed  as  N  is  varied  and,  for  a 
weighted  FSQF,  that  fT  =  1/Teff ,  where  Teff  is  determined  by  the  weights  as  well  as  the  total  time  T.) 
This  tracking  of  low-frequency  (FF)  noise  arises  because  of  the  well-known  inability  of  an  FSQF  to 
distinguish  between  causal  behavior  and  noise  that  is  correlated  over  the  measurement  interval  [1].  It  is 
this  tracking  that  causes  the  noise  to  be  HP  filtered  in  vj(t)  and  ■  with  an  HP  cutoff  knee  at 

approximately  fT .  One  should  point  out  that  this  FF  tracking  also  occurs  for  white  noise,  but  is  only 
apparent  for  neg-p  noise,  because  virtually  all  the  power  in  the  neg-p  noise  is  in  Fourier  components  with 
f  <  1/T  (for  any  value  of  T). 


Figure  4.  Simulated  least  squares  solutions  for  various  p  values. 


One  can  write  a  spectral  representation  for  a^_j  as 

/•  oo 

°v-j  =  2j0  Lv(f)|Hs(f)|2Kv-j(f)df  +  oic .  .(16) 

The  left  term  in  (16)  is  a  previously  published  spectral  integral  [5,6,27]  that  describes  the  Lv(f)- 
dependent  part  of  .  In  this  paper,  a  new  term,  a^_c ,  has  been  added  in  (16)  to  include  the  effects  of 
model  error.  This  term  arises  when  the  complexity  of  vw  M(t,A)  is  not  sufficient  to  follow  the  variations 
in  vc(t)  over  the  observation  interval  T  (See  Appendix  A  for  more  detail.).  In  the  integral  part  of  (16), 
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there  are  two  factors,  |  Hs(f)  |2  andKv_j(f) ,  that  can  HP  filter  Lv(f) .  As  discussed  in  the  Mathematical 
Niceties  Section,  Hs(f)  represents  the  explicit  filtering  action  of  the  system  under  consideration  on  the 
data  variable  v(t)  [5,6,27].  |Hs(f)|2  in  (16)  replaces  the  simple  LP  cut-off  fh  used  in  previous 

formulations  of  the  spectral  integral  to  model  the  system  [8].  |  Hs(f)  |2  is  a  more  accurate  representation 

of  a  system’s  specific  filtering  properties  than  fh  and  can  be  shown  to  have  HP  as  well  as  LP  behavior  for 
many  types  of  systems  [5,6,27].  The  importance  of  such  HP  filtering  from  |  Hs(f)  |2  is  that  it  helps  a2_j 
converge  in  the  presence  of  neg-p  noise  and,  in  fact,  in  and  of  itself  can  ensure  the  convergence  of  a2_j 
for  some  or  all  of  the  common  neg-p  values  [5,6,27]. 

A  classic  example  of  a  system  response  function  is  the  “delay”  Hs(f)  shown  in  Figure  5.  This  well- 
known  response  function  arises  when  one  mixes  a  signal  with  a  delayed  version  of  itself,  as  in  a  delay  line 
discriminator,  radar  system,  or  two-way  ranging  system  [27].  One  can  show  that  |  Hs(f)  |2=  4sin2(7ifrd) 
for  such  a  system  [27].  When  f  «i,  this  |  Hs(f)  |2  is  proportional  to  f 2 ,  which  by  itself  allows  a2_j  to 
converge  for  neg-p  noise  with  p  >  -2  . 


„  (D> - 

<|)(t)  Y<|>(t-Td) 

<x)-—  |Hs(f)|2xf2(f«1) 


Figure  5.  Delay  system  response  function. 


Kv_j(f)  is  a  spectral  kernel  that  describes  the  spectral  properties  of  the  LSQF  and  MS  generation  process 

independent  of  the  system  response  filtering  [6,27].  In  Appendix  A,  exact  formulas  are  derived  for 
Kv_j(f)  and  an  equivalent  Kv_w(f)  for  a2_w  in  terms  of  a  spectral  decomposition  of  the  vw  M(t,A)  basis 

functions,  |Hs(f)|2,  and  Lv(f).  Also  derived  in  Appendix  A  is  a  similar  kernel  for  g2_c  in  terms  of  a 
dual-frequency  PSD  for  the  nonstationary  vc(t) .  In  the  next  subsection,  we  will  prove  that 

Kv_j(f)  oc  f 2M  (fT  « 1)  when  va  M(t,  A)  is  an  (M-l)th-order  polynomial  (17) 

Kv_j(f)  oc  fp  (fT  «  1  and  p?>  2)  when  va  M(t,  A)  is  any  function  with  a  DC  component.  (18) 

Figure  6  shows  the  results  of  a  simulation  verifying  that  (17)  is  indeed  true  for  M  =  1  to  5  and  N  =  1000 . 
This  verifies  that  Kv_j(f)has  the  required  HP  filtering  properties  for  fT  « 1  to  ensure  that  a2_j  will 
converge  for  any  neg-p  value,  if  one  can  choose  the  form  of  vw  M(t,  A) . 

Proof  of  Noise  HP  Filtering  for  Residual  LSQF  Error 

Let  us  now  prove  assertions  (17)  and  (18).  We  will  prove  (17)  by  decomposing  the  data  v(tn)  into 
components 


V(tn)  =  ^]  vf(tn)+vc(tn) 


f 


(19) 
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where  the  vf  (t)  =  Vfej27rft  are  single-frequency  noise  components  that  sum  to  generate  the  total  noise 
v  (tn).  Because  of  the  linearity  of  va  M(t,A)  in  A  given  by  (6),  we  note  that  the  LSQF  solution 
vw  M(t,A)  for  the  total  input  can  be  decomposed  into  the  sum  of  LSQF  solutions  for  the  separate  vf  (tn) 
and  vc(tn)  inputs,  or 


(N=1000)  Log10(fT)  (Unweighted  LSQF) 

Figure  6.  Simulation  showing  Kv_j(f)xf2M  for  fT«l  for  an  (M-l)th-order  polynomial 
model  function  (and  an  unweighted  LSQF). 


vw,M(t>A)  =  X  Vw,M(t>A(f))  +  vw,M(t>A<C))  •  (20) 

f 

Also,  because  the  spectral  noise  components  vf(t)  for  a  wide-sense  stationary  noise  process  v  (t)  are 
uncorrelated  with  each  other  [10]  and  with  vc(tn) ,  we  can  use  (20)  to  write  c>2_j  as 


=  E{MS{|  Vj(tn)  |2}}  =  E{MS{|  Vj_c(tn)  |2}}  +  X  E{MS{|  Vj-fOn)  I2}}  (21) 

f 

where 


Vj-f(tn)  =  Vf  (tn)  -  vwM(tn,A(f))  Vj_c(tn)  =  vc(tn)-vW;M(tn,Aw) 


(c)> 


(22) 


and  where  E{..}  the  expectation  value  has  been  added  to  the  MS  operation  in  order  to  generate  a2_j  in 


terms  of  the  PSD  Lv(f)  (see  Appendix  A).  Thus,  since  (16)  is  just  the  infinitesimal  limit  of  (21),  the 
spectral  properties  of  Kv_j(f)  in  (16)  can  be  determined  by  considering  the  LSQF  properties  of  each 


E{MS{|  Vj_f  (tn)  |2}}  term  separately. 


To  do  so,  let  us  expand  Vfej27rft  using  the  well-known  Taylor  Theorem  as 

M-l  .  xxk  +  \\M 


Vf(t)  =  VfeJ2ltft»|;  (j2*f0-to))  +Vf 


k=0 


(j27Tf(t'-t0))P 
M! 


(23.) 
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where  t’  is  somewhere  in  [t0,t].  We  then  note  that  the  residual  error  Vj_f(tn)  and,  hence, 
E{MS{|  Vj_f  (tn)  |2}}  ,  would  be  zero,  if  vf  (t)  were  given  only  by  the  right-hand  finite  sum  term  in  (23). 
This  is  because  the  model  function  vw  M(t,A(f))  and  the  finite  sum  term  would  then  both  be  (M-l)th-order 
polynomials,  so  the  fitted  vw  M(t,A(f))  would  be  exactly  vf  (t) .  Thus,  when  the  Taylor  series  converges, 
the  value  of  E{MS{|  vj_f  (tn)  |2}}  must  be  proportional  to  the  square  magnitude  of  the  right-hand  term  in 
(23).  This  term  is  proportional  to  f2M  and,  therefore,  we  must  have  Kv_j(f)  ocf2M  for  f(t'-t0)«l  or 
fT  « 1 ,  which  is  just  (17). 

To  prove  (18),  we  note  that  a  DC  component  is  a  Oth-order  polynomial.  Therefore,  by  using  (17)  with 
M=l,  Kv_j (f)  must  be  at  least  proportional  to  f2  for  fT  «  1 . 

HP  and  LP  Properties  of  Jitter  and  Wander  Defined  as  Residual  LSQF  Errors 

Now,  let  us  discuss  the  HP  and  LP  properties  of  jitter  and  wander  defined  as  vj(tn)  and  vw(tn) .  The 

exact  HP  and  LP  properties  of  a2_j  and  g^_j  can  be  derived  from  the  formulas  for  Kv_j(f)  and  Kv_w(f) 

given  in  Appendix  A.  Here,  we  will  discuss  their  overall  nature.  As  discussed  in  the  previous 
subsections,  the  LQSF  causes  Vj(tn)  to  be  HP  filtered  with  a  knee  frequency  fT  =  1/T .  Similarly,  one 

can  show  that  the  LSQF  causes  vw(tn)  to  be  LP  filtered  with  the  same  knee  frequency.  This  LSQF 
filtering  at  fT  is  shown  in  the  left  side  of  Figure  7.  In  addition,  Hs(f)  filters  both  vj(tn)  and  vw(tn) 
equally,  since  Hs(f)  has  the  same  effect  on  all  variables.  In  the  left  side  of  Figure  7,  this  Hs(f)  filtering 
is  shown  parametrically  using  an  HP  knee  fi  and  a  LP  knee  fh.  Thus,  the  brick-wall  filtering  properties  of 
the  jitter  as  vj(tn)  are  determined  by  an  HP  knee  fc  that  is  the  higher  of  fT  and  fi  and  a  LP  knee  given  by 

fh  ,  which  are  purely  functions  of  system  parameters.  We  also  note  that  the  equivalents  of  conventional 
x-jitter  and  x-wander  [4,11,12]  are  xj(tn)  and  xw(tn)  with  a  2nd-order  time  error  polynomial  removed. 
This  guarantees  that  the  x-jitter  will  converge  for  p  >  -6  in  Lx(f)  without  any  help  from  Hs(f) ,  and  thus 
guarantees  the  convergence  of  the  x-jitter  for  all  the  neg-p  noise  components  normally  encountered. 


Figure  7.  Jitter  ( vj (tn ) )  and  wander  ( vw  (tn ) )  HP  &  LP  properties. 


The  right  side  of  Figure  7  shows  that  the  wander  will  disappear  when  T  — » oo  »  fT )  and  all  that  will 
remain  is  the  jitter,  if  the  HP  order  of  Hs(f)  is  sufficient  by  itself  to  overcome  the  pole  in  Lv(f)  —  That 
is,  if  Hs(f)  by  itself  guarantees  the  convergence  of  the  wander.  This  case  is  the  transition  to  stationary 
statistics,  because  the  total  Hs(f)  filtered  noise  variance  must  then  be  convergent  for  a  finite  bandwidth 
system.  Furthermore,  as  mentioned  previously,  these  definitions  of  jitter  and  wander  as  Vj(tn)  and 
vw(tn)  allow  these  concepts  to  be  generalized  to  any  type  of  causal  function  removal  and  any  variable. 
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What  to  Do  When  the  Residual  Error  Variance  Does  Diverge 

c>2_j  can  diverge  in  the  presence  of  neg-p  noise  when  the  problem  being  addressed  fixes  the  form  of  the 
vw  M(t,A) .  In  this  paper,  it  is  maintained:  (a)  that  such  a  divergence  is  an  indication  of  a  real  problem  in 

the  design,  specification,  or  analysis  of  the  system  under  consideration,  and  (b)  that  this  real  problem 
must  be  investigated  and  fixed,  not  sidestepped.  From  the  discussions  in  this  paper,  it  is  obvious  that  the 
HP  filtering  properties  of  cr2_j  are  fixed  by  vw  M(t,A),  the  cr2_j  measurement  interval  T,  and  Hs(f)  as 

given  by  the  system  specification  (or  problem  definition)  and  design.  Thus,  such  a  divergence  must 
indicate:  (a)  that  something  is  essentially  wrong  with  the  system  design  or  spec,  or  (b)  that  the  system  is 
okay,  but  faulty  analysis  generated  a  perceived  (non-essential)  divergence.  For  Case  (a),  the  system  itself 
has  to  be  changed  to  correct  the  problem,  and  for  Case  (b),  the  system  does  not  have  to  be  changed  to 
correct  the  problem,  just  the  improper  analysis. 


Phase  Lock  Loop 

§r£f*=rT*- 

Osc  Loop  Filt  — K~)VCO 

l 

Cycle  Slips 

\fb 

<l>  =  <l>out  ■  4>ref 

4> 

Figure  8.  Cycle  slips  in  a  lst-order  phase  lock  loop  due  to  f  3  noise. 


The  best  way  to  understand  how  to  deal  with  such  divergences  is  to  use  the  specific  example  of  an 
essential  divergence  shown  in  Figure  8.  Here,  a  lst-order  phase-lock  loop  (PLL)  is  operating  using  a 
reference  oscillator  with  f  3  phase  noise.  As  shown  in  the  figure,  such  a  PLL  will  cycle  slip  because  of 
the  f~3  noise  [27].  An  indication  of  these  cycle  slips  appears  in  the  linear  loop  analysis  as  a  divergence 
in  j  for  M  =  0  (no  vw  M(t,A)  removal),  where  §  is  defined  in  Figure  8.  diverges  in  the  linear 

analysis  because  |  Hs(f)  |2  for  a  lst-order  PLL  is  proportional  to  f2  for  f  « 1  and  -(f)  =  1  for  M  =  0  , 

and  this  combination  of  HP  orders  are  not  insufficient  to  overcome  the  f“3pole  in  L^f).  One  could 

increase  M  in  j  to  resolve  the  divergence  in  the  linear  model  (K^f)  oc  f2  or  higher),  but  this  will  not 

keep  the  PLL  from  cycle  slipping!  Thus,  one  can  see  that  arbitrarily  changing  the  error  measure  to 
eliminate  the  divergence  in  the  analysis  does  nothing  to  solve  the  actual  cycle-slipping  problem. 
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One  can  fix  the  real  problem  in  two  ways.  First,  one  can  change  the  design  and  eliminate  the  cycle  slips 
altogether.  This  is  accomplished  simply  by  changing  to  a  2nd-order  PLL,  for  which  |  Hs(f)  |2oc  f4  (f  « 1) 

[27].  Second,  one  can  allow  occasional  cycle  slips,  because  the  system  users  can  tolerate  them. 
However,  one  must  then  change  the  system  spec  so  the  phase  noise  without  the  slips  can  be  properly 
measured.  This  is  accomplished  by  specifying  that  is  to  be  measured  excluding  data  containing 

cycle  slips,  which  effectively  changes  K(J)_j(f) .  In  addition,  one  should  also  include  a  mean  time  to  cycle 

slip  requirement  in  the  spec  to  ensure  that  the  cycle  slips  don’t  become  a  nuisance.  An  example  of  a  non- 
essential  divergence  is  simply  the  failure  to  recognize  the  HP  filtering  due  to  causal  extraction  in  a2_j . 

RELATING  a?>M(x)  TO  aV  WHEN  va>M(t,A)  IS  A  POLYNOMIAL 

For  N  =  M  + 1  data  points,  Appendix  B  shows  that 

tfv-j  (N  =  M  + 1,  M)  =  cj2;M  (T  /  M)  (24) 

when  cjy_j(N,M)  uses  the  unweighted  “unbiased”  MS  ( =  (N  -  M)-1  =  1 )  and  vwM(t,A)  is  an  (M-l)th- 
order  polynomial.  For  M=l,  (24)  is  just  the  well-known  statement  that  the  Allan  variance  of  y(t)  is  the 
two-sample  variance  when  hs(t)  is  a  box-car  average  over  x  [13,23],  that  is  a2_j(2,l).  One  can  also 

analytically  demonstrate  that  (24)  is  true  for  the  M=2  case.  Thus,  the  Hadamard-Picinbono  variance  is 
equal  to  a2_j(3,2)  (when  hs(t)  is  a  box-car  average  over  x),  that  is,  when  frequency  offset  and  drift  are 

modeled  in  the  causal  behavior. 

As  shown  in  Figure  9,  one  can  extend  (24)  to  any  number  of  samples  by  investigating  the  behavior  of 
LSQF  simulations  as  N  is  varied  (while  T  and  M  remain  fixed).  In  the  figure,  we’ve  plotted  the  biased 
form  of  the  residual  error  deviate  indicated  by  RMSjvj}  (  =  N-1 )  versus  the  number  of  samples  N.  One 

can  see  from  the  figure  that  RMS{vj}  does  not  vary  much  as  N  increases  from  N  =  M  +  1  to  very  large 
values,  especially  for  neg-p  noise.  Using  this  approximate  invariability  of  RMS{vj}  as  N  is  changed,  one 
can  then  write 

CTv-j(N,M)  =  /M)  (any  N)  (25) 
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1 

0 
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| _ <Vw 
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Figure  9.  Errors  in  LSQF  residuals  as  N  is  varied  (fixed  T  and  M  =  2). 


We  also  note  that  one  can  obtain  exact  expressions  similar  to  (25)  for  any  specific  p  by  deriving  Mth-order 
“bias”  function  relationships  in  a  fashion  similar  to  those  derived  by  Allan  [19]  and  Barnes  [20]  for  the 
M  =  1  case.  One  should  note,  however,  for  M  =  1 ,  that  our  bias  functions  have  inherently  different 
behavior  than  those  of  Allan  and  Barnes.  This  is  because  we  fix  both  T  and  x  (T  =  Mx)  as  N  is  varied, 
while  Allan  and  Barnes  fix  x  as  N  is  varied  and  let  the  total  observation  interval  (Nx)  change  with  N 
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(Allan  and  Barnes  define  T  as  the  time  between  successive  tn  samples,  not  the  total  observation  interval.). 
Thus,  our  bias  functions  are  very  close  to  N/(N-M)  for  all  N  and  p,  unlike  the  Allan-Barnes  bias 
functions,  which  vary  widely  with  p. 

From  the  above,  one  can  see  that  g^m(T/M)  can  be  interpreted  as  a  measure  of  the  MS  residual  error 
(N,M)  for  any  N  when  vw  M(t,A)  is  an  (M-l)th -order  polynomial.  This  has  several  important 
consequences,  which  are  discussed  as  follows: 

(a)  gJm(t)  can  be  used  to  determine  a^_j(N,M)in  residual  error  problems  when  vwM(t,A)  is  an 
(M  -  l)th -order  polynomial.  Thus,  ct^m(t)  and  o^_j(N,M)  can  be  viewed  as  equivalent  error  measures 
for  these  problems. 

(b)  In  such  residual  error  problems,  there  is  a  great  advantage  in  using  o^m(T/M)  for  o^_j(N,M), 
because  one  need  not  perform  the  LQSF  and  remove  vw  M(t,A)  from  the  raw  data  in  order  to  generate 
av,M  (T  /M) .  This  is  because  of  the  well-known  insensitivity  of  ct^m(t)  to  (M-l)th-  or  lower-order 
polynomial  behavior  [14,25].  We  further  note  that  o^m(T/M)  without  vw  M(t,A)  removal  remains 
equivalent  to  g^_j(N,M)  with  such  removal  even  when  there  is  model  error.  This  is  because  model  error 
equally  effects  q^m(T/M)  and  a^_j(N,M). 

(c)  (25)  provides  guidance  about  which  orders  of  a^M(x)  are  appropriate  when  it  is  meant  to  measure 

purely  random  error.  The  interpretation  of  ct^m(t)  as  a  measure  of  o^_j(N,M)  shows,  if  one  wants 

oJM(T)  t°  measure  only  random  error  when  vwM(t,A)  can’t  follow  the  variations  in  vc(t)  over  T,  that 

one  must  remove  such  causal  behavior  first.  This  equivalence  of  g^m(t)  and  g^_j(N,M)  is  only  strictly 

true  only  when  x  =  T/M.  However,  for  x  decoupled  from  T/M,  one  can  assume  that  the  same 
approximate  sensitivity  applies.  Thus,  this  interpretation  explains  the  well-known  insensitivity  of  the 
Hadamard-Picinbono  variance  to  frequency  drift  and  the  sensitivity  of  the  Allan  variance  to  such  drift. 

(d)  Conventional  x-jitter  as  defined  in  [4-6]  (but  without  an  ad-hoc  fc)  is  equivalent  to  the  Hadamard- 
Picinbono  variance  of  x(t)  ( o^3  (x)  ). 


CONCLUSIONS 

We  have  demonstrated  that  a^M(x),  a^_j(N,M),  and  the  MS  jitter  (without  ad-hoc  filtering)  can  all  be 

viewed  as  essentially  measuring  the  same  type  of  error.  The  key  to  this  is  the  demonstration  of  three 
major  facts:  (a)  that  a  statistically  optimal  removal  of  the  causal  behavior  in  the  data  HP  filters  the  noise 
in  the  observable  residual  error,  (b)  that  the  order  of  the  noise  HP  filtering  is  a  function  of  the  complexity 
of  the  model  function  used  to  estimate  the  causal  behavior,  and  (c)  that  o^m(T/M)  is  a  measure  of 

ciy_j(N,M)  when  the  model  function  used  to  estimate  the  causal  behavior  is  an  (M-l)th -order 
polynomial. 
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Table  1  shows  the  consequences  of  interpreting  M  =  0,  1,2,  and  3  o^M(x)  and  a2  M(x)  as  j  and  a2_j 

with  aging  removed.  An  important  conclusion  shown  in  the  table  is  that  low-order  variances  such  as 
g^0(t),  0^(1),  and  are  appropriate  for  characterizing  “random”  error  in  coherent  frequency 

synthesizers  and  coherent  time  and  frequency  distribution  equipment  (excluding  the  frequency  reference). 
This  is  because  uncontrolled  time  or  frequency  offsets  that  are  fixed  over  the  measurement  interval  are 
part  of  the  “random”  error  that  must  be  considered  in  specifying  such  devices;  that  is,  these  devices  are 
not  supposed  to  have  such  uncontrolled,  but  fixed,  offsets.  On  the  other  hand,  such  fixed  offsets  are 
generally  not  considered  part  of  the  “random”  instabilities  in  oscillators,  where  higher-order  difference 
variances  are  used,  but  are  modeled  as  causal  error  or  compensated  for  using  PLLs  or  other  similar 
techniques.  Therefore,  the  precision  oscillator  community  uses  higher-order  difference  variances  as 
measures  of  “random”  error.  This  difference  in  application  explains  the  dichotomy  in  the  use  of  variances 
between  some  producers  of  time  and  frequency  distribution  equipment  [4]  and  producers  of  precision 
oscillators  [13]. 

As  a  final  note,  consider  the  difference  between  av_w  and  RMS{vj}  in  Figure  9.  The  simulations  in  this 
case  utilized  an  Hs(f)  with  fj  «  fT ,  so  the  theoretical  wander  is  effectively  divergent.  We  see  from  the 
figure  that  av_w  >  RMS{vj}  =  ov_ •  (for  large  N,  using  (25)).  Thus,  the  observable  error  is  underestimating 
the  true  function  error,  as  is  expected  from  correlated  LSQF  theory.  In  fact,  in  running  multiple  Figure  9 
simulation  sessions,  one  sees  av_w  vary  widely  from  run  to  run,  while  RMSjvj}  or  oY_-  remains  stable. 


Table  1.  Difference  variances  of  y(t)  and  x(t)  as  residual  error  variances  with  aging  removed. 


M 

A  Var  of  y 

Aging  Excluded 
Application 

A  Var  of  x 

Aging  Excl 
Application 

0 

MS{y} 

None 

Synthesizers  & 
Rel  time  dist  equip 

MS{x} 

None 

Abs  time  dist 
equip 

1 

Allan  y 

y  offset 

Oscillators  (y  drift 
in  instability) 

TIErms2/2 

MS{TIE}/2 

x  offset 
Synth  &  rel 
time  dist 

2 

Hadamard 

Picinbono 

y  ofs  &  drift 
Oscillators  (y  drift 
not  in  instability) 

Allan  x 
Jitter  2 
[28] 

x  &  y  offset 
Osc  (y  drift  in 
instab) 

3 

ax  32(t)  is  equivalent  to 

MS  of  x  or  TIE  jitter  with  time 
&  freq  offset  &  freq  drift 
removed 

Hadamard 

Picinbono 

x,y  ofs  y  drift 
Osc  (y  drift 
not  in  instab) 

Thus,  in  order  to  get  a  reliable  estimate  of  av_w  over  T,  one  must  take  data  over  periods  much  longer 
than  T  and  determine  the  power  law  structure  of  the  true  noise  vp(t)  (except  for  the  case  where  Hs(f) 
alone  guarantees  convergence  and  >  fT ).  Then  one  can  use  the  formulas  in  Appendix  A  to  estimate  the 
true  expectation  average  of  av_w  using  Lv(f)  accurately  determined  from  the  data  over  the  period  »  T. 
This  is  where  the  classic  techniques  involving  the  use  of  the  modified  Allan  variance  or  direct  spectral 
measurements  are  invaluable  [13].  Thus,  the  final  conclusion  of  this  paper  is  the  well-known  fact  in  the 
PTTI  community  that  one  must  perform  careful  analysis  and  measurements  over  periods  »  T  to 
determine  good  estimates  of  the  true  causal  function  error  or  true  residual  error  when  neg-p  noise  is 
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involved.  The  exception  for  this  is  again  the  case  where  Hs(f)  alone  guarantees  convergence  and  f ,  >  f T  . 
In  this  case,  the  wander  is  not  divergent  and  the  Hs(f)  is  such  that  the  estimate  of  the  causal  function 
from  data  over  T  is  a  good  one. 

APPENDIX  A.  DERIVATION  OF  THE  SPECTRAL  KERNELS 

In  this  section,  we  will  derive  the  kernels  Kv_j(f)  and  Kv_w(f)  for  a^_j  and  a^_w  and  a  dual-frequency 
kernel  Kv_c(fg,f)  for  a^_c.  For  generality,  we  will  let  v(t)  and  vc(t)  be  complex  and  the  weights  be 

arbitrary.  To  generate  these  kernels,  we  will  first  minimize  %2  the  sum  of  the  squares  weighted  by  the 
,  which  can  be  written  in  matrix  notation  as 

x2  =«  (v(t’)  *  -AtUt(f  ))(v(f )  -  U(t')A)  » .  (A.l) 

In  the  above: 

(a)  A1"  is  the  complex  conjugate  transpose  of  the  M-element  column  vector  A  =  (a0,alv..aM_1)'  ('  is  the 
transpose). 

(b)  U(f )  =  (u0(f  ),u1(f  ),...uM_1(f ))  is  an  M-element  row  vector  representing  the  M  basis  functions  um(t) 
for  vw  M(t,  A)  in  (6),  and  we  note  that  vw  M(t,  A)  in  this  notation  is 

vWfM(t,A)  =  U(t)A  .  (A.2) 

(c)  «. . .»  is  the  weighted  average  over  the  data  samples  (denoted  by  the  dummy  time  index  f ),  which, 
for  the  purposes  of  relating  the  LSQF  to  continuous  Fourier  transform  PSDs,  «. . .»  is  defined  as 

p+oo 

«  z(t’)  »=  dt’p(t')z(t’)  (A. 3 

J— GO 

where  the  density  function  p(t)  is  given  by 

N-l 

p(t)  =  2ynS(t-tn).  (A.4) 

n=0 

(d)  v(t) ,  vp(t) ,  and  vc(t)  are  all  assumed  to  be  filtered  by  the  system  response  function  hs(t)  as  in  (5). 

In  the  well-known  manner,  we  differentiate  y2  with  respect  to  A1"  to  obtain  the  following  LQSF  solution 


A  =  Q  «  U(t')tv(t')» 

(A.  5) 

vw,M(t,A)  =  U(t)Q  «  UCt^+vCf)  » 

(A.6) 

where 

Q  =«  (U(t')tU(t'))»_1 . 

(A.7) 

Using  (A.3)  and  (A.4), 

one  can  write  (A.  6)  as 

p+co  p+00 

vw>M(t,  A)  =  f  dt'gw  (t,  t')v(t’)  =  [df  Gw  (t,  f)V(f)Hs  (f) 

J— OO  J— 00 

(A.  8) 

573 


where:  (a)  the  Green’s  function  gw(t,tf)  is 

gw(t,t')  =  XJ(t)QU(t')tp(t’) .  (A.9) 

(b)Gw(t,f)  is  the  complex  conjugate  of  the  Wigner-Ville  spectrum  [29]  of  gw(t,f)  is 

i-|-00 

dteJ2lrf‘gw(t,t')  (A.  10) 

-oo 

and  (c)  V(f)Hs(f)  is  the  Fourier  transform  ofv(t)  is 

V(f)Hs(f)  =  3f>t{v(t)}  -  fdte_i2,lftv(t)  (A.l  1) 

J— 00 

Note  that  V(f)  alone  is  the  Fourier  transform  of  the  time  domain  variable  before  being  convoluted  with 
hs(t) ,  so  that  Hs(f)  is  explicitly  shown  in  the  right  side  of  (A.8). 

We  note  from  (A.2)  that  vw  M(t,  A)  is  linear  in  A  ,  and  thus  the  LSQF  solutions  (A.5)  and  (A.6)  are  linear 
in  v(t) .  Therefore,  we  can  write 

vw,M  (t,  A)  =  vwM  (t,  A(p) )  +  vWjM  (t,  A(c) ) 

where 

A(p)  =  Q  «  U(t’)t  vp(t')  »  A(c)  =  Q  «  U(t')tvc(t')  »  . 

Using  this  linear  separation,  (A.8),  and  (A.12),  we  can  write 

Vj  0)  =  vj,p  (t)  —  v  j,c  (t)  Vw  (t)  =  VW;P  (t)  +  V  j>c  (t) 
vw,p(t)  +  Vj>p(t)  =  Vw(t)  +  Vj(t)  =  Vp(t) 

where 

g j (t, t')  =  S(t-t')-gw(t,t')  Gj (t, f )  =  3_f  t, {g j (t, t')}  =ejrot  - Gw (t, f ) 

f  x  f+00  /»+GO 

vw,P  (0  =  vw,m  (t»  A<p) )  =  |dt'gw(t,t')vp(t')=  [df  G  w  (t,  f )  Vp  (f  )HS  (f ) 

J— 00  J— 00 

,  X  P+00  i*+GO 

vj,P  (0  =  VP  (t)  -  vw,m  0,  A  p  )  =  [  dt'g  j  (t,  t')vp  (t’>  =  f  df  Gj  (t,  f)V  (f  )HS  (f ) 

J— 00  tJ— 00 

p+oo  p+co 

v  j,c  (t)  =  vWjM  (t,  A(c) )  —  Vc  (t)  =  — T  dt'g  j  (t,  t’)vc  (t')  =  -  f  df  G  j  (t,  f  )VC  (f  )HS  (f ) 

and  where  Vp(f)  and  Vc(f)  are  defined  in  similar  fashion  as  the  definition  of  V(f)  in  (A.  11)  (that  is, 
before  hs(t)  is  applied).  We  note  from  (A.14)  and  (A.16)  that  vJC(t)  is  the  model  residual  and  vJp(t) 
and  vw  p(t)  are  the  jitter  and  wander  solely  due  to  vp(t) ,  which  add  together  to  produce  vp(t)  just  like  the 
total  jitter  and  wander  vj(t)  and  vw(t). 

By  taking  the  ensemble  average  E{...}  of  the  square  of  Vj(t)  and  vw(t)  from  (A.14)  and  (A.16)  and 
assuming  vp(t)  is  uncorrelated  with  vc(t) ,  one  can  write 


(A.12) 

(A.  13) 

(A.14.) 

(A.  15) 

(A.16) 
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E{|  Vj (t)  |2}  =  E{|  Vjp(t)  |2}  +  E{|  vj  c(t)  |2} 

E{|vw(t)|2}  =  E{|vw>p(t)|2}  +  E{|vJ>c(t)|2} 

where 

E{|  Vj  (t)  |2}  =  r  df  I  Gj  (t,f)Hs (f)  I2  Lv(f)  E{|  Vw>p(t)  |2}  =  r  df  I  Gw(t,f)Hs  (f)  |2  Lv(f) 

J— 00  J— 00 

/»+Q0  i*+00 

E{|  vj,c(t)  |2=  J  dfgj  ^ df  Lc (fg , f )G j  (t, f  +  0.5fg )G j  (t,f  —  0.5fg )H s  (f  +  0.5fg )HS (f  - 0.5fg ) 

with 

Lv(f)  =  3f>t{Rv(x)}  Lc(fg,f)  =  3fg>tg  {3f;t{Rc(tg,t)}} 

Rv(t)  =  E{vp(tg  +  x/2)v*(tg  -  t/2)}  Rc(tg,x)  =  E{vc(tg  +  x/2)v*(tg  -  x/2)} . 


(A.  17) 


(A.  18) 


(A.  19) 
(A.20) 


In  the  above,  is  Lc(fg,f)  is  the  rotated  Loeve  spectrum  [29,30]  of  vc(t)  given  by  the  double  Fourier 
transform  of  the  rotated  double  time  autocorrelation  function  Rc(tg,x) .  The  term  rotated  comes  from 

writing  the  lag  function  vc  (tj  )v*  (t2 )  in  terms  of  the  “rotationally”  transformed  global  time 
tg  =  0.5(t!  + 12)  and  local  or  differential  time  x  =  (t,  - 1, ) .  When  v(t)  is  real,  we  note  that 


Rv(-x)  =  Rv(x)  Rc(tg  ,-x)  =  Rc(tg,x) 

Lv (f )  =  Lv ( — f )  =  L*v (f )  Lc(fg,f)*  =  Lc(-fg,f)  Lc (fg ,-f )  =  Lc (fg , f )  NO  real] 

G  w  (t,f  )*  =  Gw(t,-f)  Gj  (t,  f  )*  =Gj(t,-f) 

If  we  now  integrate  (A.  1 8)  over  t  from  -oo  to  oo ,  we  obtain  our  final  result 

a2_j>p  =  f  df  Kt_j(f)  |  Hs  (f)  |2  Lp(f)  a2_j  =  o2_jiP  +  o2_c 

a2_WiP  =  [dfKv_w(f)|Hs(f)|2  Lp(f)  av-w  =  av-w,p  +®v-o 

J— 00 

P+CO  m+cc 

a2_c  =  f  dfgf  df  Kv_c(fg,f)Hs(f  +  0.5fg)Hs(f -0.5fg)  Lc(fg,f) 

J— 00  J— 00 

Where  our  kernels  are 

Kv_j(f)  =  J*  |  Gj  (t,f)  |2  Kv_w(f)  =  |  Gw(t,f)  |2 

f+0°  * 

Kv_c  (fg ,  f )  =  J  dt  G  j  (t,  f  +  0.5fg  )G  j  (t,  f  -  0.5fg  ) 


(A.21) 


(A.22) 


(A.23) 


APPENDIX  B.  VERIFYING  THAT  oJ_j(M  +  l)  =  aJ>M(T/M) 

In  this  appendix,  we  will  verify  that  (25)  is  true  when  a2_j  is  the  “unbiased”  uniformly  weighted  residual 
variance  ( =  (N  -  M )  1 ).  To  do  this,  we  will  show  that  the  stronger  assertion 

vj(tn)=  [A(T/M)m  v(t0)]c(M,n)  WM  (B.l) 
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is  true  by  Monte  Carlo  simulation.  One  can  show  (B.l)  guarantees  (25)  by  using  (12)  and  (B.l)  as 
follows 


M  M 

<j(M  +  l)  =  £  |Vj(tn)|2=|A(T/M)Mv(t0)|2^M-2^  |c(M,n)|2 

n=0  n=0  (B.-2) 

=  XM-‘  I  A(T/M)m v(t0)  |2=  g2,m(T/M) 

(B.l)  has  been  verified  by  Matlab  simulation  for  multiple  random  data  sets  (runs)  up  toM  =  18  .  Above 
this  M  value,  the  Matlab  LSQF  code  used  ran  into  precision  difficulties  in  the  computation  of  vw  M(t,  A) . 

An  example  Matlab  simulation  for  M  =  5  is  shown  in  Table  B-l.  Thus,  one  can  say  that  (25)  is  true  for 
any  M  with  an  extremely  high  confidence  because  of  the  detailed  nature  of  assertion  (B.l). 

To  understand  analytically  why  (B.l)  is  true,  let  us  reformulate  the  notation  of  Appendix  A  into  one  more 
suited  for  an  unweighted  LSQF.  Let  us  define  the  following  N  =  M  + 1  element  column  vectors 
(n  =  0  to  M):  Cn  =  c(M,n) ,  Vn  =  v(tn) ,  VJ?n  =  Vj(tn) ,  and  Vw  =  vw?M(tn,A)  =  UA  ,  where  Un?m  =  t”  (M+l 
rows  for  index  n  and  M  columns  for  index  m  =  0toM-l).  We  also  note  that  (12)  can  be  written  as 
A(T/M)m v(t0)  =  CV  and  (B.l)  becomes 

Vj  =  V-Vw  =  CCtV/XM  (B.3) 

where  in  (14)  becomes 


Table  B-l .  Monte  Carlo  verification  of  (B.l)  for  M  =  5. 


Run  -> 

1 

2 

3 

4 

5 

6 

A(T/M)Mv(t0) 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

AMVj(tn)/c(M,n) 

to 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

tl 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

t2 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

t3 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

u 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

t5 

-4.3818 

-20.6668 

-15.3472 

-11.062 

-3.9762 

7.6004 

^M=CfC 

and  x2  from  (A.  1)  becomes 

x2  =  v.fy.  =  (yt  _  a^XV  -  UA) . 

Regenerating  the  LSQF  solution  by  taking  the  derivative  of  (B.5)  with  respect  to  A  ,  we  obtain 

UfVj  =  Uf(V  -  UA)  =  0 

which  yields  the  unweighted  LSQF  solution 


(B.4) 

(B.5) 

(B.6) 


576 


39th  Annual  Precise  Time  and  Time  Interval  (PTTI)  Meeting 


Vw  =  UQUtV 


where  Q  1  =  l^U  . 


(B.7) 


If  we  insert  (B.3)  into  (B.6),  we  obtain 

VtCCt\  =  0  (B.8) 

and  note  that  this  must  be  true  as  a  necessary  condition  for  (B.3)  to  be  the  LSQF  solution.  To  show  this, 
we  note  from  (12)  that  U^C  =  0  as  follows 

M+l  M-l 

UfC  =  ^Unimc(M,n)  =  ^c(M,n)t”-/  =  A(t)m  C'1  =  0  (B.9) 

n=l  n=l 

because  the  Mth-order  difference  of  an  (M-l)th-order  or  less  power  of  t  is  zero  [14].  Thus,  (B.3)  satisfies 
this  necessary  condition  with  any  arbitrary  kM  .  Multiplying  (B.6)  by  A1"  yields  the  well-known 
orthogonality  principle  for  an  LSQF,  which  states  that  the  residual  error  Vj  is  orthogonal  to  the  estimated 
function  Vw.  Thus,  (B.9)  states  that  C  is  also  orthogonal  to  Vw,  which  explains  why  (B.3)  is  an  LSQF 

solution,  since  CC^V  in  (B.3)  is  the  projection  of  V  in  the  C  direction.  From  (B.3)  and  (B.7),  we  also 
note  that  we  can  write 

UQU1  ^I-CC1/^  (B.10) 


which  is  useful  in  simplifying  the  calculations  for  the  spectral  kernels  in  Appendix  A  for  the  unweighted 
N  =  M  + 1  LSQF  case. 

In  trying  to  analytically  prove  that  =  C^C  must  be  the  unique  LQSF  solution,  some  mathematical 
difficulties  were  encountered  by  the  author.  These  must  be  resolved  before  developing  a  general 
analytical  proof  of  (B.3).  Therefore,  for  now,  one  must  rely  on  Monte  Carlo  simulations  to  verify  that 
(B.l)  or  (B.3)  is  true. 
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